1. INTRODUCTION

Dynamic Voltage/Frequency Scaling (DVFS) has been used to reduce the power consumption of
computing systems. DVFS is a technique that raises or lowers the supply voltage together with the
operating frequency of CMOS circuits. CMOS circuits have static and dynamic power dissipation, and
dynamic power dissipation is the dominant component in CMOS [1]. Most research on DVFS has
focused on CPU DVFS [2], [3] because the CPU is the most power-consuming device when a computer
system is actively running. Many contemporary OSs support CPU DVFS; Linux's cpufreq subsystem [4]
is an example.

Frequency scaling is also supported by hardware devices other than the CPU, such as the GPU
or the memory bus; in such cases the operating frequency of the device can be managed by the user. For
example, Linux has a subsystem called devfreq to support frequency scaling of devices other than the
CPU [5]. The Nexus 6 smartphone is a commercial mobile device that supports device frequency scaling,
which allows us to adjust the clock speed of the memory bus and thereby the memory bandwidth. Changing
the memory access frequency gives us another option for managing the power consumption of embedded
systems.

Attempts to manage the power consumption of memory access have been made recently. The authors
of [6] proposed a DVFS method for DRAM based on memory bandwidth utilization. They devised a
bandwidth-based frequency selection policy from their experimental finding that memory latency is not
significantly affected by the memory frequency at low bandwidth. Because memory hardware with DVFS
support was not available, they emulated frequency scaling using timing delays. DRAM does not support
DVFS so far because scaling the I/O voltage of DRAM affects stability and requires significant hardware
changes, but DFS (Dynamic Frequency Scaling) is possible. In [7], a low-power mode of DRAM based on DFS
was introduced to reduce energy consumption with limited hardware changes. The results of [7] were further
extended to consider both CPU and DRAM power consumption in a server [8]. In [9], power management of
DRAM using both DFS and low-power states is modeled and studied using simulation. The joint scaling of
CPU and DRAM frequencies was also studied in [10] for server systems. Low-power design of SRAM can
also be considered, as in [11], but in this paper we focus on DRAM power management.

Previous works require hardware changes for memory frequency scaling to manage DRAM power
consumption. In this letter, we propose a power management method that combines DVFS of the CPU and
DFS of the memory bus. We show that the CPU and the memory are closely related in terms of energy
efficiency, which depends on the number of memory accesses per instruction. From this relationship, we
find an optimal ratio between the CPU frequency and the memory frequency.

The study was performed using a real device. The target device used in this study is a commercial
smartphone, Nexus 6, which has a Snapdragon 805 CPU with 3GB LPDDR3 SDRAM. The CPU frequency can
be set to one of 18 levels (300, 422.4, 652.8, 729.6, 883.2, 960, 1036.8, 1190.4, 1267.2, 1497.6, 1574.4, 
1728, 1958.4, 2265.6, 2457.6, 2496, 2572.8, 2649.6 MHz) and the memory bus frequency can be set to one 
of 13 levels (50, 75, 100, 150, 200, 259, 307, 393, 460, 528, 662, 796, 1065 MHz). 

This paper is organized as follows. Section 2 explains the power model of the CPU and the memory
that is used to analyze the relationship between the CPU and memory frequencies in Section 3.
The analysis in Section 3 shows that the CPU frequency and the memory frequency are closely related in
terms of energy efficiency. Based on this analysis, Section 3 also presents a method for selecting the
frequencies of both the CPU and the memory. Section 4 presents experimental results on a commercial
smartphone on which our frequency selection method is implemented. Finally, Section 5 concludes our work.


2. POWER MODEL OF CPU AND MEMORY 
The power consumption of the CPU in embedded systems is usually divided into dynamic and static 
power [12]. The power consumption of the CPU can be modeled as: 


$$P_{cpu} = P_{dyn} + P_{static} \approx a V^2 F_c + I V \quad (1)$$


where $a$ is a coefficient of the switching activity and the effective capacitance, $V$ is the operating
voltage, $F_c$ is the CPU frequency, and $I$ is the leakage current. Reduction in operating voltage decreases
the dynamic power consumption, but increases the circuit delay. The relation between the operating voltage
and the CPU frequency is given by:


$$F_c \propto \frac{(V - V_{th})^2}{V} \quad (2)$$


where $V_{th}$ is a threshold voltage which is much smaller than the operating voltage [13], [14].
Equation (2) can be rewritten as $F_c \propto V$, from which we can reformulate Equation (1) as:


$$P_{cpu} = P_{dyn} + P_{static} \approx \beta F_c^3 + \gamma F_c \quad (3)$$


where $\beta$ is a variable depending on the switching activity and $\gamma$ is a hardware-dependent constant.
For multicore CPUs, the power consumption is the sum of the per-core power, that is,
$P_{cpu} \approx \sum_i (\beta_i F_i^3 + \gamma F_i)$, where $i$ is the core number. Switching activities may
differ from core to core.

On the other hand, the power consumption of the DRAM system can be divided into operation 
power and background power [6], [9]. The operation power is the power required to execute memory reads 
and writes. The background power accounts for all power consumption when there is no memory access. 
Lowering the frequency to access memory affects the power consumption; it lowers background power 
linearly [7]. The operation power is not affected by memory frequency, but the energy required for memory 
access increases because the access time becomes longer. For the DDR-series DRAMs, the background 
power is a major component in the total DRAM power consumption [6]. So we assume that the operation 
power can be ignored in our model, and the power consumption of the main memory is modeled as: 


$$P_{mem} \approx \rho F_m \quad (4)$$
Figure 1. Power consumption of Nexus 6 in idle state with different bus frequencies


where $F_m$ is the memory device frequency, and $\rho$ is a hardware-dependent constant. Combining
Equations (3) and (4), we have the power consumption estimate:


$$P = P_{cpu} + P_{mem} \approx \beta F_c^3 + \gamma F_c + \rho F_m \quad (5)$$


assuming the CPU and the memory are the dominating power consumption devices. 

We measured the power consumption of the target device while varying the bus frequency with the
system in idle state ($\beta = 0$). Results are shown in Figure 1. Because the operating frequency of the
SDRAM ranges from 166MHz to 800MHz, the power consumption is almost unchanged below 200MHz and
above 796MHz. Changing the CPU frequency from the lowest level to the highest level in idle state does not
affect the power consumption. The power consumption at the lowest bus frequency is about 0.305W and at
the highest it is about 0.621W. The difference between the maximum and the minimum is about 0.316W, and
$\rho \approx 0.498$ when $F_m$ is expressed in GHz.
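
The value of the memory coefficient can be reproduced from the numbers quoted above. The short Python sketch below assumes that the slope is taken over the SDRAM operating range stated in the text (166MHz to 800MHz) rather than over the full range of bus settings; the variable names are illustrative only.

```python
# Idle power measured at the lowest and highest bus frequency (W), from the text.
p_low_w = 0.305
p_high_w = 0.621

# SDRAM operating range in GHz (assumption: the slope is taken over
# 166 MHz .. 800 MHz, the range in which the power actually changes).
f_low_ghz = 0.166
f_high_ghz = 0.800

delta_p_w = p_high_w - p_low_w                 # about 0.316 W
rho = delta_p_w / (f_high_ghz - f_low_ghz)     # slope in W per GHz

print(f"delta P = {delta_p_w:.3f} W, rho = {rho:.3f} W/GHz")
```

With these inputs the slope comes out at roughly 0.498 W/GHz, which matches the value used in the model.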


[Plot with fitted curve: y = 0.036x^3 + 0.314x + 0.305]


Figure 2. Power consumption for CPU intensive benchmark (one core) 


To obtain the values of $\beta$ and $\gamma$ we used cpubomb, included in the Isolation Benchmark Suite [15],
which fully utilizes the CPU and does not access memory. Figure 2 shows the power consumption of cpubomb
for different CPU frequencies when only one core is used for the benchmark. The memory bus frequency was
fixed at the lowest level, so we assume 0.305W is consumed by the memory device. Regression analysis gives
us $\beta \approx 0.036$ and $\gamma \approx 0.314$, which provide estimates very close to the measured values.
The hardware-dependent parameters are used to estimate the power of a multicore application using the
relationship $P_{cpu} \approx \sum_i (\beta_i F_c^3 + \gamma F_c) = (\sum_i \beta_i) F_c^3 + n\gamma F_c$, where
$n$ is the number of cores used. The comparison between the estimated values and the measured values is
shown in Figure 3. The estimation error accumulates as the number of cores increases: the average prediction
error is 1.5%, 3.1%, 5.2%, and 7.0% for 1, 2, 3, and 4 cores, respectively. Because the target overheated when
multiple cores were used at CPU frequencies over 2.4GHz, we could not measure accurate values over 2.4GHz.
In this letter, we simply let $\beta$ represent $\sum_i \beta_i$.
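
As a rough illustration of how the fitted parameters are used, the Python sketch below evaluates the CPU power model for one or more cores. It assumes that every active core has the same per-core coefficient (the text notes that switching activities may differ) and that frequencies are given in GHz; the function names are hypothetical and not part of the original work.

```python
BETA = 0.036      # per-core coefficient from the single-core regression
GAMMA = 0.314     # hardware-dependent constant from the same regression
P_BASE_W = 0.305  # power measured at the lowest bus frequency (W)


def cpu_power_w(f_c_ghz: float, n_cores: int = 1) -> float:
    """Estimated CPU power in W, assuming identical beta on every active core."""
    return n_cores * (BETA * f_c_ghz ** 3 + GAMMA * f_c_ghz)


def total_power_w(f_c_ghz: float, n_cores: int = 1) -> float:
    """CPU estimate plus the baseline attributed to the memory device."""
    return cpu_power_w(f_c_ghz, n_cores) + P_BASE_W


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, round(total_power_w(1.4976, n), 3))
```

The single-core case reduces to the fitted curve of Figure 2; the multicore case simply scales both terms by the number of cores, which is the simplification adopted in the text.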




Figure 3. Power estimation for multicore execution (CPU intensive) 


3. CPU AND MEMORY FREQUENCY SELECTION FOR ENERGY EFFICIENCY 

We use EDP (Energy Delay Product) [16] as the measure of energy efficiency to consider both
turnaround time and energy consumption. The energy-delay product has been widely used as a metric of
energy efficiency that couples energy consumption and performance. It is the product of the delay
(the execution time until the end of the program) and the energy consumed during the execution of the
program. Because the energy consumption of executing an instruction is the product of the power
consumption and the execution time of the instruction ($CPI/F_c$), the EDP of executing an instruction is
modeled as


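$$EDP \approx (\beta F_c^3 + n\gamma F_c + \rho F_m) \times \left(\frac{CPI}{F_c}\right)^2 = \left(\beta F_c + \frac{n\gamma}{F_c} + \frac{\rho F_m}{F_c^2}\right) \times (CPI)^2 \quad (6)$$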


where CPI is the Cycles Per Instruction and $n$ is the number of cores used. CPI is affected by the
memory frequency if there is a memory access. For a RISC CPU such as an ARM processor, if we let $CPI_0$
be the CPI when there is no memory access, CPI can be estimated as


$$CPI = \left(1 + \delta \cdot \frac{F_c}{F_m}\right) CPI_0 \quad (7)$$


where $\delta$ is 1 if there is a memory access, otherwise 0. Assuming $CPI_0$ is a constant value,
minimizing the EDP is equivalent to minimizing the following:


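$$\mathcal{F}(F_c, F_m) = \left(\beta F_c + \frac{n\gamma}{F_c} + \frac{\rho F_m}{F_c^2}\right) \times \left(1 + \delta \frac{F_c}{F_m}\right)^2 = \left(\frac{F_m + \delta F_c}{F_c F_m}\right)^2 \cdot (\beta F_c^3 + n\gamma F_c + \rho F_m) \quad (8)$$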
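
The two forms of Equation (8) can be checked numerically. The Python sketch below is only an illustration: it uses the parameter values fitted in Section 2 as defaults, takes frequencies in GHz, and the function names are hypothetical.

```python
import math


def cost_sum_form(f_c, f_m, delta, beta=0.036, gamma=0.314, rho=0.498, n=1):
    """First form: per-instruction term times (1 + delta * F_c / F_m)^2."""
    per_instr = beta * f_c + n * gamma / f_c + rho * f_m / f_c ** 2
    return per_instr * (1.0 + delta * f_c / f_m) ** 2


def cost_product_form(f_c, f_m, delta, beta=0.036, gamma=0.314, rho=0.498, n=1):
    """Second form: ((F_m + delta * F_c) / (F_c * F_m))^2 times the power."""
    power = beta * f_c ** 3 + n * gamma * f_c + rho * f_m
    return ((f_m + delta * f_c) / (f_c * f_m)) ** 2 * power


if __name__ == "__main__":
    for delta in (0, 1):
        a = cost_sum_form(1.4976, 0.460, delta)
        b = cost_product_form(1.4976, 0.460, delta)
        print(delta, a, b, math.isclose(a, b))
```

Both functions return the same value for any positive frequency pair, since the second form only moves the factor of F_c squared from the power term into the squared fraction.
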
If $\delta = 0$, we have

$$\mathcal{F}(F_c, F_m) = \beta F_c + \frac{n\gamma}{F_c} + \frac{\rho F_m}{F_c^2} \quad (9)$$


and the optimal value is obtained when $F_m$ is at its minimum. When $\delta = 1$, because the
harmonic mean of $F_c$ and $F_m$ is not larger than their geometric mean, we have


$$\mathcal{F}(F_c, F_m) \ge \frac{4P}{F_c F_m} = 4\left(\frac{\beta F_c^2 + n\gamma}{F_m} + \frac{\rho}{F_c}\right) \quad (10)$$
where $P$ is given in Equation (5). So the optimal value can be found when $F_m$ is at its
maximum. Thus, when $\alpha$ is the rate of memory access per instruction, the frequencies can be chosen by
minimizing the expected value of $\mathcal{F}(F_c, F_m)$:


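$$(1-\alpha)\left(\beta F_c + \frac{n\gamma}{F_c} + \frac{\rho F_m}{F_c^2}\right) + 4\alpha\left(\frac{\beta F_c^2 + n\gamma}{F_m} + \frac{\rho}{F_c}\right) = \left(1 - \alpha + 4\alpha \frac{F_c}{F_m}\right)\left(\beta F_c + \frac{n\gamma}{F_c} + \frac{\rho F_m}{F_c^2}\right) \quad (11)$$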
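
The inequality behind Equation (10) and the factorization in Equation (11) can be written out step by step. The following is a reconstruction from the definitions above, writing $\mathcal{F}$ for the objective of Equation (8) with $\delta = 1$ and $P = \beta F_c^3 + n\gamma F_c + \rho F_m$:

```latex
\begin{align*}
\frac{2F_cF_m}{F_c+F_m} \le \sqrt{F_cF_m}
  &\;\Longrightarrow\;
  \left(\frac{F_c+F_m}{F_cF_m}\right)^2 \ge \frac{4}{F_cF_m},\\
\mathcal{F}(F_c,F_m)\Big|_{\delta=1}
  = \left(\frac{F_m+F_c}{F_cF_m}\right)^2 P
  &\ge \frac{4P}{F_cF_m}
  = 4\left(\frac{\beta F_c^2+n\gamma}{F_m}+\frac{\rho}{F_c}\right),\\
4\alpha\left(\frac{\beta F_c^2+n\gamma}{F_m}+\frac{\rho}{F_c}\right)
  &= 4\alpha\,\frac{F_c}{F_m}
     \left(\beta F_c+\frac{n\gamma}{F_c}+\frac{\rho F_m}{F_c^2}\right).
\end{align*}
```

The last line is what allows the two weighted terms to share the common per-instruction factor.
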
By letting $F_c / F_m = R$ in (11), we have


$$(1 - \alpha + 4\alpha R)\left(\beta F_c + \frac{1}{F_c}\left(n\gamma + \frac{\rho}{R}\right)\right) \quad (12)$$


With the CPU frequency $F_c$ given, we can calculate the frequency ratio $R$ minimizing Equation (12) as


$$R = \sqrt{\frac{(1-\alpha)\rho}{4\alpha(\beta F_c^2 + n\gamma)}} \quad (13)$$
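
The step from Equation (12) to Equation (13) is a one-variable minimization. As a sketch, with the shorthand $A$, $B$, and $g$ introduced here only for readability (they do not appear in the original):

```latex
\begin{align*}
g(R) &= (1-\alpha+4\alpha R)\left(A+\frac{B}{R}\right),
  \qquad A=\beta F_c+\frac{n\gamma}{F_c},\quad B=\frac{\rho}{F_c},\\
\frac{dg}{dR} &= 4\alpha A-\frac{(1-\alpha)B}{R^2}=0
  \;\Longrightarrow\;
  R^2=\frac{(1-\alpha)B}{4\alpha A}
     =\frac{(1-\alpha)\rho}{4\alpha\,(\beta F_c^2+n\gamma)}.
\end{align*}
```

For $0 < \alpha < 1$ the second derivative $2(1-\alpha)B/R^3$ is positive, so this stationary point is a minimum.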


Figure 4. Frequency pair for optimal EDP 


If $\alpha = 0$, $R \to \infty$, so $F_m$ is set to its minimum value. If $\alpha > 0$, the value of $R$ can be
calculated for a given $F_c$, which yields the corresponding value of $F_m$. With a limited number of CPU
frequency levels, we can calculate the value of $F_m$ for each $F_c$ with the given utilization and the average
memory access rate per instruction. Then we compare the corresponding values of Equation (11) to determine
the pair of $F_c$ and $F_m$ that gives the minimum. As an example, the values of $F_c$ and $F_m$ obtained for a
single-core application are shown in Figure 4 ($\alpha$ ranges from 0.1% to 99.9% in steps of 0.1%). The highest
memory bus frequency is used as a bound to indicate that the optimal $F_m$ is higher than 800MHz. The results
show that if the memory access rate is less than 0.3%, we do not need to raise the memory frequency from its
lowest level. When the memory access rate is less than 3.5%, the memory frequency should be kept below its
highest level.
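
The selection procedure described above can be sketched in Python as follows. This is a minimal illustration and not the governor implementation used in the experiments: the frequency tables are those listed in Section 1, the parameters are the fitted values from Section 2, and snapping the computed memory frequency to the nearest available level is an assumption made here, since the text does not state how the continuous value is mapped to a discrete level.

```python
import math

# Frequency levels of the target device, as listed in Section 1 (MHz).
CPU_LEVELS_MHZ = [300, 422.4, 652.8, 729.6, 883.2, 960, 1036.8, 1190.4, 1267.2,
                  1497.6, 1574.4, 1728, 1958.4, 2265.6, 2457.6, 2496, 2572.8, 2649.6]
MEM_LEVELS_MHZ = [50, 75, 100, 150, 200, 259, 307, 393, 460, 528, 662, 796, 1065]

# Fitted parameters from Section 2 (valid for frequencies in GHz).
BETA, GAMMA, RHO = 0.036, 0.314, 0.498


def optimal_ratio(f_c_ghz, alpha, n=1):
    """Equation (13): ratio R = F_c / F_m that minimises Equation (12)."""
    if alpha <= 0.0:
        return math.inf  # no memory access: F_m should be as low as possible
    num = max((1.0 - alpha) * RHO, 0.0)
    den = 4.0 * alpha * (BETA * f_c_ghz ** 2 + n * GAMMA)
    return math.sqrt(num / den)


def expected_cost(f_c_ghz, f_m_ghz, alpha, n=1):
    """Equation (11): expected EDP-proportional cost of a frequency pair."""
    per_instr = BETA * f_c_ghz + n * GAMMA / f_c_ghz + RHO * f_m_ghz / f_c_ghz ** 2
    return (1.0 - alpha + 4.0 * alpha * f_c_ghz / f_m_ghz) * per_instr


def memory_level_for(f_c_mhz, alpha, n=1):
    """Snap F_m = F_c / R to the nearest available level (assumed policy)."""
    r = optimal_ratio(f_c_mhz / 1000.0, alpha, n)
    if math.isinf(r):
        return MEM_LEVELS_MHZ[0]
    if r == 0.0:
        return MEM_LEVELS_MHZ[-1]
    target_mhz = f_c_mhz / r
    return min(MEM_LEVELS_MHZ, key=lambda level: abs(level - target_mhz))


def select_pair(alpha, n=1):
    """Return the (F_c, F_m) pair in MHz with the lowest expected cost."""
    candidates = []
    for f_c in CPU_LEVELS_MHZ:
        f_m = memory_level_for(f_c, alpha, n)
        cost = expected_cost(f_c / 1000.0, f_m / 1000.0, alpha, n)
        candidates.append((cost, f_c, f_m))
    _, best_f_c, best_f_m = min(candidates)
    return best_f_c, best_f_m


if __name__ == "__main__":
    for alpha in (0.0, 0.01, 0.05, 0.5):
        print(alpha, select_pair(alpha))
```

Because both frequency tables are small, an exhaustive pass over the CPU levels is cheap enough to run at every sampling period; a real governor would additionally need to estimate the memory access rate at run time, which this sketch takes as an input.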


4. APPLICATION TO A REAL TARGET 

We measured the energy consumption and performance of applications on a real target (Nexus 6) to
validate our analysis. To measure the power and energy consumption, we disassembled the battery part of
the Nexus 6 and connected the charging port to a digital power meter (an ODROID Smart Power), which
supports a 10Hz sampling rate. We tested three benchmarks: cpubomb, ramsmp [17], and STREAM [18].
ramsmp and STREAM have 4 kinds of operations: copy, scale, add, and triad. Copy moves data from one array
to another. Scale multiplies data from an array by a value and stores the result in another array. Add adds
data from two arrays and stores the sum in a third array. Triad combines scale and add. The operations of
ramsmp were tested separately, but those of STREAM were tested all together for comparison. The ranges of
the memory access rate $\alpha$ for the benchmarks are: 0.276-0.702 for copy, 0.016-0.023 for scale, 0.06-0.07
for add, and 0.016-0.023 for triad of ramsmp; 0.001 for cpubomb; and 0.14-0.70 for STREAM. The CPU
utilization is 100% for all benchmarks.
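
For reference, the EDP of a run can be computed from a trace of power samples as the product of energy and execution time, following the definition in Section 3. The Python sketch below assumes evenly spaced samples at the meter's 10Hz rate and a simple rectangle-rule integration; the function name and the sample trace are made up for illustration and are not measurements from this study.

```python
def energy_and_edp(power_samples_w, sample_rate_hz=10.0):
    """Return (energy in J, EDP in J*s) for evenly spaced power samples."""
    dt_s = 1.0 / sample_rate_hz
    delay_s = len(power_samples_w) * dt_s        # execution time covered by the trace
    energy_j = sum(power_samples_w) * dt_s       # rectangle-rule integral of power
    return energy_j, energy_j * delay_s


if __name__ == "__main__":
    # Synthetic trace: 5 seconds at a constant 2 W (illustrative values only).
    trace = [2.0] * 50
    energy, edp = energy_and_edp(trace)
    print(f"energy = {energy:.2f} J, EDP = {edp:.2f} J*s")
```

Comparing governors then amounts to comparing the EDP values obtained for the same benchmark under each policy.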

We compared the energy efficiency of different Linux governors with the presented method. Our
method was implemented using the governor interface of Linux, and the sampling rate of our policy is the
same as that of the other governors. Linux supports 3 dynamic policies for CPU DVFS: conservative,
ondemand, and interactive. The default CPU DVFS policy for the Nexus 6 is the interactive governor, which is
typical for Android devices. The governor for the memory bus is cpubw_hwmon; it monitors memory reads and
writes and adjusts the bus frequency according to the memory bandwidth. Note that our method has an
integrated governor that performs both CPU DVFS and control of the memory bus frequency simultaneously.
Figure 5 compares the EDP of the benchmarks.
Figure 5. EDP of benchmarks


With ramsmp, frequency scaling of the CPU and memory based on our analysis shows the lowest EDP
in this experiment. The EDP improves by about 8.6% for the copy operation and about 3.3% for the triad
operation over the default governor. Over all ramsmp operations, the energy efficiency improves by about
3.4% over the interactive governor and 9.6% over the conservative governor. The test with the STREAM
benchmark shows a similar result: an improvement of 7.6% over the interactive governor and 11.7% over the
conservative governor. If memory is barely accessed, the proposed method does not degrade performance, as
the results with cpubomb show.


5. CONCLUSION 

Although the CPU is the most power-consuming device in a computer system, memory also has a
significant effect on power consumption as well as performance. Because of its impact on performance,
the memory is especially important in terms of energy efficiency. Thus, selecting the CPU frequency without
considering memory access could fail to optimize the energy efficiency of the system. In this paper, we
have analyzed the relationship between the CPU and memory frequencies in terms of energy efficiency. For
CPU-intensive applications, lowering the memory access frequency can reduce the power consumption of the
system. For applications with considerable memory access, proper selection of the CPU and memory
frequencies is needed. We presented a model for this selection and tested it on a real target (Nexus 6
smartphone). The results show that frequency assignments based on our analysis enhance energy efficiency.